Generation of insights based on automated document analysis

ABSTRACT

Certain aspects of the present disclosure provide techniques for generating an insight, comprising: receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.

INTRODUCTION

Aspects of the present disclosure relate to generating insights for an entity based on automated document analysis.

Organizations often create complex documents regarding the status of their operations. Such documents may imply latent conditions of the organization that are not obvious from viewing the documents themselves. Conventionally, organizations have relied on experts to help derive insights from these otherwise dense documents; however, such practices are expensive, time-consuming, and subjective (and thus not consistently repeatable).

Accordingly, there is a need for methods for generating insights based on automated analysis of documents.

BRIEF SUMMARY

Certain embodiments provide a method for generating an insight. The method generally includes receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.

Other embodiments provide processing systems configured to perform the aforementioned method as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned method as well as those described herein; a computer program product embodied on a computer-readable storage medium comprising code for performing the aforementioned method as well as those further described herein; and a processing system comprising means for performing the aforementioned method as well as those further described herein.

The following description and the related drawings set forth in detail certain illustrative features of one or more embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The appended figures depict certain aspects of the one or more embodiments and are therefore not to be considered limiting of the scope of this disclosure.

FIG. 1 depicts an example computing environment for generating automated insights.

FIGS. 2A and 2B depict an example document and an example tree derived from the document.

FIG. 3 depicts an example mapping of results to a set of insight elements used to generate an insight.

FIG. 4 depicts an example process of calculating a health score for a document based on a set of insight elements.

FIG. 5 depicts an example analysis containing an insight and a health score based on a document.

FIG. 6 depicts an example process flow for automatically generating an insight and a health score for a document.

FIG. 7 depicts an example method of generating an insight using a trained model.

FIG. 8 depicts an example processing device that may be configured to perform the methods described herein.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.

DETAILED DESCRIPTION

Aspects of the present disclosure provide apparatuses, methods, processing systems, and computer-readable mediums for generating insights based on automated analysis of documents.

Generating insights about a particular organization's operations is generally challenging. Organizations generate massive volumes of data of many different sorts. Conventionally, an organization may have tasked an expert with sorting through the data to generate feedback on the organization's operations, but this approach has proven unreliable for several reasons. First, human experts generally have expertise in a limited domain and are thus unable to process a wide variety of data, especially data outside their technical domain. Second, human experts are inherently subjective, and two different experts may often come to two different conclusions based on the same underlying data. Third, human experts are slow at processing large volumes of data, especially when the data are presented in disparate formats and meaningful data are not easily distinguishable from data that are not meaningful. Simply put, human experts are not able to perform this sort of complex automated analysis in their minds, given the inherently challenging analytical task of generating insights from huge volumes of structured and unstructured organization data.

To overcome the challenges of conventional methods, embodiments described herein provide an automated approach to processing organization data to generate meaningful insights about the organization's operations. In various embodiments, data (e.g., documents) are transformed into a structured format, such as a tree-like object, so that the data are more amenable to automatic processing. This transformation allows data elements (e.g., leaves of the tree-like object) to be matched automatically to insight elements based on rules. The insight elements may then be combined by a trained machine learning model to form an insight for delivery to a user.

Beneficially, insights generated in this manner represent objective intelligence regarding the operation of an organization based on many different types of input data. Unlike conventional methods, the insight generation methods described herein are based on trained artificial intelligence models that can consider a wider variety of data consistently and without subjective bias. The methods described herein thus represent a technical solution to an extant technical problem in the related art in that they provide a scalable, repeatable, and accurate way to assess data regarding an organization without the shortcomings of traditional methods.

Example Computing Environment for Generating Automated Insights

FIG. 1 depicts an example computing environment 100 for generating automated insights.

Generally, an insight may comprise a textual description regarding one or more aspects of an organization's operations. For example, if the organization is a business, an insight may relate to changes in customer activity, revenue, costs, and the like. If the organization is a government entity, an insight may likewise relate to revenues, costs, head counts, budgets, and the like. If the organization is an individual, an insight may again relate to revenues, costs, assets, and possibly personal information. As described further herein, insights may be generated by trained machine learning models based on insight elements matched to data provided for analysis. Each insight element may be associated with a particular topic, such as “Customer”, “Income”, or “Assets”.

As illustrated, the computing environment 100 includes a server 102 interacting with one or more computing devices, such as computing devices 140, 142, and 144. In this example, the server 102 includes a health score component 104, an insight component 106, a document analyzer 108, a ranking component 138, and databases 110. The server 102 may exchange one or more requests, files, records, or other data with the computing devices 140, 142, and 144.

Computing devices 140, 142, and 144 can each include, for example, a desktop computer, a laptop computer, a tablet computer, a smartphone, a smart wearable device, a virtual machine, or another type of computing device. The computing devices 140, 142, and 144 may contain one or more files or records that may be sent to and processed by the server 102. The computing devices 140, 142, and 144 may also receive and store one or more files or records that have been processed or generated by the server 102. In some cases, one or more computing devices may process data and create records before sending those records to the server 102.

As shown in this example, the server 102 includes document analyzer 108, which further includes a conversion component 120, a parsing component 122, and a mapping component 124. Generally, document analyzer 108 analyzes one or more documents received from a computing device or, in some cases, documents that are already stored on server 102 (e.g., in documents 126). Before documents are received, a set of insight elements and a set of rules may be received from a computing device, such as computing device 142, and stored on the server (e.g., in insights 130 and rules 128).

In one embodiment, the set of rules and set of insight elements are received from a computing device associated with one or more experts (e.g., computing device 142). In another embodiment, the set of rules and set of insight elements are defined on server 102 and stored in rules 128 and insights 130. Further, rules 128 and insights 130 may be updated with new rules and new insights.

In this example, document analyzer 108 uses conversion component 120 to convert one or more documents into a tree-like object (referred to herein as a “tree”) so that information in the documents may be extracted from the tree and later manipulated into results that can be mapped to insight elements. In one embodiment, the tree may be embodied in a JavaScript Object Notation (JSON) object or an extensible markup language (XML) file, but other types of structured data files are possible. In some examples, each document and its related tree are associated with an organization, and generally an organization may be associated with many documents and trees.

A tree converted from a document may be stored in trees 136 after conversion. In various examples described herein, the documents may be historical documents, which may be used as training data, or current documents, which may be used for insight generation.

Document analyzer 108 uses parsing component 122 to parse a tree generated from a document for specific data related to rules. Rules may be coded expressions that, when executed, perform one or more functions with regard to the leaves of a tree to generate a result. For example, certain rules may extract values or text from a tree as a result. Additionally, certain rules may take values or text extracted from a tree as a result and perform a calculation to transform that result into a different result. Further, certain rules may compare or manipulate results associated with a first tree in relation to results associated with a second tree in order to obtain a new result for the first tree.
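The three kinds of rules described above (extraction, calculation, and cross-tree comparison) can be sketched as plain Python callables. This is a minimal illustration under stated assumptions, not the disclosed rule format: the `Rule` class, the flattened `{tag: value}` view of a tree, and the prior-period gross income of 80,000 are all hypothetical; the tags GI and E and the values $100,000 and $40,000 come from the examples in this description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical flattened view of a parsed tree: metadata tag -> leaf value.
Leaves = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule wrapper: a named function over one or two trees."""
    name: str
    fn: Callable[[Leaves, Optional[Leaves]], float]

    def apply(self, current: Leaves, other: Optional[Leaves] = None) -> float:
        return self.fn(current, other)


def _gross_income_change(cur: Leaves, other: Optional[Leaves]) -> float:
    # Cross-tree rule: needs the same topic (GI) from a second tree.
    if other is None:
        raise ValueError("comparison rule requires a second tree")
    return cur["GI"] - other["GI"]


RULES = [
    # Extraction rule: return a leaf value unchanged.
    Rule("gross_income", lambda cur, _: cur["GI"]),
    # Calculation rule: transform extracted values into a new result.
    Rule("expense_ratio", lambda cur, _: cur["E"] / cur["GI"]),
    # Comparison rule: relate the first tree to a second tree.
    Rule("gross_income_change", _gross_income_change),
]

current_tree: Leaves = {"GI": 100_000, "E": 40_000}
previous_tree: Leaves = {"GI": 80_000}  # hypothetical prior-period value

results = {rule.name: rule.apply(current_tree, previous_tree) for rule in RULES}
print(results)
```

Keeping every rule behind the same `apply` signature lets a parser run a whole rule table over a tree without special-casing rules that need a second tree.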

Document analyzer 108 uses mapping component 124 to map the results (e.g., created by parsing component 122) to insight elements stored on server 102. Each result may be mapped to one or more insight elements based on a value, text, or pattern of the result and the associated rule. Generally, each insight element contains a textual description explaining the value, text, or pattern of the result. The document analyzer may collect all insight elements associated with the results of parsing the tree to create a set of insight elements. In some embodiments, the set of insight elements associated with that tree is a subset of all insight elements stored on the server 102 (e.g., in insights 130).

The server 102 further includes insight component 106, which generates an insight associated with a document based on a set of insight elements. In this example, insight component 106 further includes insight generation model 116, which can use natural language processing (NLP) techniques, such as a Bidirectional Encoder Representations from Transformers (BERT) model, a Generative Pre-trained Transformer 3 (GPT-3) model, or a GPT-2 model, to generate an insight from a set of insight elements.

In this example, the server 102 further includes ranking component 138, which determines a specific order of insight elements in the insight by ranking each insight element. Ranking component 138 may rank insight elements according to one or more criteria received from a computing device, such as computing device 140 or 142. Probabilities associated with each insight element may be used by vectorizer 118 to create a vector of probabilities for each insight element, which may be input to a ranking model 146 to generate a ranking score for each insight element. Ranking component 138 may then use the ranking scores to order the insight elements.

In various embodiments, the ranking model 146 may be a regression model, such as an XGBoost regressor, a linear regression model, or a decision tree regression model, that receives the vector as input and outputs a score to be ranked. In some embodiments, the insight component 106 may receive and use the order when an insight is generated. The ranking scores for each insight element in a specific set of insight elements, or for all insight elements in insights 130, may be stored in rankings 132, and the stored ranking scores may be updated as new insights are created and ranked.

In this example, the server 102 further includes health score component 104, which may generate a health score from a set of insight elements. Health score component 104 further includes vectorizer 112, which may create a vector based on a set of insight elements associated with a document, and a health score model 114, which may generate a health score based on the vector from vectorizer 112. In this example, the health score model 114 may be a regression model, such as an XGBoost regressor, a decision tree regression model, or a linear regression model. In some embodiments, the server 102 may train the health score model 114. The health scores generated by health score component 104 may be stored in scores 134 of databases 110.

In this example, the server 102 further includes databases 110, which may store various databases such as documents 126, rules 128, insights 130, rankings 132, scores 134, and trees 136. As described above, various components of the server 102 can access the various databases in databases 110.

Example Tree Object Based on Document

FIGS. 2A and 2B depict a document 200 associated with an organization (“Young Business, Inc.” in this example) and a tree 204 based on document 200. In one example, document 200 is converted to tree 204 by server 102 of FIG. 1.

Generally, document 200 may be any sort of data file, including text documents, reports, or spreadsheets, as well as other structured and unstructured data types. In some examples, document 200 may be a type of standardized report for an organization, such as an income report, asset report, profit and loss report, and the like. Beneficially, such documents generally follow standard conventions from organization to organization and so may be easier to convert to structured object containers, such as tree objects.

As illustrated, the document 200 includes information about an organization in one or more fields of the document 200. The information, shown in textual elements 202 a-202 q, may concern particular aspects of a certain organization, such as the name of the organization (e.g., as shown in textual element 202 a), the gross income of the organization for a given time period (e.g., as shown in textual element 202 b), or the taxable income of the organization for a given time period (e.g., as shown in textual element 202 c). Textual elements 202 a-202 q are exemplary, and more textual elements with different types of information may be used. In some embodiments, the document may have elements that are not textual, such as images. The textual elements may include one or more labels and values, such as “expenses” and “$40,000”.

The document 200 may be a physical form, an electronic form, or another type of form. If the document 200 is an electronic form, the server may analyze the document and convert the document 200 into tree 204. If the document 200 is a physical form, the server may perform text recognition on an image of the document 200 to extract the textual elements and then convert the document 200 into tree 204 using the image of the document and the extracted textual elements. In some examples, the image of the document may be downloaded by the server or sent to the server by a computing device (such as computing device 140, 142, or 144 of FIG. 1).

After the server retrieves or extracts the textual elements 202 a-202 q, the server (e.g., by conversion component 120 of FIG. 1) converts the document 200 into tree 204, which includes leaves 206 a-206 q that each correspond to a textual element. In one embodiment, the server may use a recursive function to convert the document 200 into tree 204. In some embodiments, the tree may be stored as a structured object file, such as a JSON object or an XML file.
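One way to picture the recursive conversion is the sketch below, which turns a nested mapping of labels to values into a JSON-serializable tree in which every non-mapping value becomes a leaf. It assumes the textual elements have already been extracted into that nested form (the section names "Financials" and "Customers" are hypothetical groupings, and the function name `to_tree` is illustrative); the description does not specify the conversion algorithm or the tree schema.

```python
import json
from typing import Any, Dict


def to_tree(label: str, content: Any) -> Dict[str, Any]:
    """Recursively convert a (label, content) pair into a tree node.

    A dict becomes a parent node with one child per key; any other
    value becomes a leaf that stores the raw value.
    """
    if isinstance(content, dict):
        return {
            "label": label,
            "children": [to_tree(key, value) for key, value in content.items()],
        }
    return {"label": label, "value": content}


# Hypothetical extracted fields, using values from the examples in this description.
document = {
    "Organization": "Young Business, Inc.",
    "Financials": {"Gross Income": "$100,000", "Expenses": "$40,000"},
    "Customers": {"Total Customers": "950", "Returning Customers": "600"},
}

tree = to_tree("document", document)
print(json.dumps(tree, indent=2))
```

Because every node is built from plain dicts, lists, and strings, the result serializes to JSON directly; an XML serializer could walk the same structure.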

As depicted in FIG. 2B, the leaves 206 a-206 q are structured as one or more parent nodes and child nodes, where each parent node is related to its child nodes. As shown, a parent node and its child node are connected by a line (e.g., leaf 206 b is connected to leaf 206 c). Each set of parent and child nodes may be part of a branch of parent and child nodes, where each node in the branch is associated with certain metadata elements (e.g., metadata elements 210 a-210 f and 212 a-212 f) containing information about the leaves in the branch.

Each leaf 206 a-206 q includes one or more values and/or text associated with a textual element of document 200. For example, leaf 206 b may include the value “$100,000”, which is associated with the label “Gross Income” and value “$100,000” in textual element 202 b. In some embodiments, a leaf may include text instead of a numerical value. In yet another embodiment, a leaf may include both text and a numerical value. Some of textual elements 202 a-202 q in document 200 may be converted into numerical values that are contained in one of leaves 206 a-206 q. For example, a “Yes” may be converted into the number “1”, while a “No” may be converted into the number “0”.
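A leaf-value normalizer along these lines could handle the “Yes”/“No” conversion and currency strings such as “$100,000”. It is a hypothetical helper (the name `normalize` and its parsing rules are assumptions); a production parser would also need to handle locales, negative amounts in parentheses, percentages, and dates.

```python
import re
from typing import Union

LeafValue = Union[int, float, str]
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def normalize(text: str) -> LeafValue:
    """Convert the raw text of a textual element into a leaf value."""
    stripped = text.strip()
    lowered = stripped.lower()
    # Yes/No answers become 1/0, as in the example above.
    if lowered == "yes":
        return 1
    if lowered == "no":
        return 0
    # Drop currency symbols and thousands separators, e.g. "$100,000".
    candidate = stripped.replace("$", "").replace(",", "")
    if _NUMBER.fullmatch(candidate):
        return float(candidate)
    # Anything else (names, free text) stays as text.
    return stripped


for raw in ["$100,000", "950", "Yes", "No", "Young Business, Inc."]:
    print(repr(raw), "->", repr(normalize(raw)))
```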

In this example, each leaf 206 a-206 q further includes metadata associated with the respective textual element, such as metadata elements 208 a-208 q associated with certain result topics, which may be used by the server when applying rules to specific results. Metadata elements 208 a-208 q are displayed as abbreviations of topics related to a value or text of leaves 206 a-206 q (e.g., “GI” of 208 b relates to “Gross Income” and “RC” of 208 i relates to “Returning Customers”). Leaves 206 a-206 q may also contain information that has a particular pattern, such as dates (e.g., in leaf 206 q). Further, information regarding coordinate locations of fields in the document 200 may also be contained in leaves 206 a-206 q in order to determine whether a tree was properly converted. The values, texts, and patterns shown in leaves 206 a-206 q are exemplary, and other information may be contained in the leaves. Further, in some embodiments, leaves may not contain metadata elements 208 a-208 q.

The server parses tree 204 (e.g., by using parsing component 122 of FIG. 1) to acquire or compare further results by applying rules (e.g., from rules 128 of FIG. 1) to one or more of the values, texts, and patterns stored in leaves 206 a-206 q. Rules may contain one or more expressions and functions that manipulate the values and text from the tree to return new values and texts.

For example, the server may calculate the number of new customers an organization received in a certain time period (such as 350 new customers) by subtracting the value contained in leaf 206 i (600, representing returning customers) from the value in leaf 206 f (950, representing total customers). Rules may also verify information or extract information from a leaf. Further, parsing is not limited to the values, descriptions, and patterns of a single tree; for example, the parsing component may compare values between two or more trees to determine new values based on the values of those trees. The server may use metadata, such as metadata elements 208 a-208 q associated with certain result topics, to identify the values necessary to apply rules to the results from the tree (e.g., determining a profit per customer of a tree by using metadata elements 208 f and 208 e to identify and apply a rule to the values of leaves 206 f and 206 e) or to apply rules comparing or manipulating values of the tree with values of other trees (e.g., using metadata element 208 b to compare the gross income of one time period, such as Nov. 1, 2020-Feb. 1, 2021 identified by metadata element 212 a, to the gross income of another time period of another document using the same result topic in another tree that is associated with the other document).

All pieces of information obtained directly from the tree 204, as well as the information determined by applying the rules to the tree, are considered results of parsing tree 204 that may be mapped to one or more insight elements, as described below with respect to FIG. 3.
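The metadata-driven lookup described above might look like the sketch below: leaves carry a topic tag (TC, RC, P), and a rule fetches its operands by tag rather than by position in the tree. The tree layout, the helper names `iter_leaves` and `value_for`, and the profit value of 47,500 are hypothetical; the totals of 950 and 600 customers, and the resulting 350 new customers, come from the example.

```python
from typing import Any, Dict, Iterator


def iter_leaves(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every leaf (a node without children) in depth-first order."""
    children = node.get("children", [])
    if not children:
        yield node
    for child in children:
        yield from iter_leaves(child)


def value_for(tree: Dict[str, Any], tag: str) -> float:
    """Return the value of the first leaf carrying the given metadata tag."""
    for leaf in iter_leaves(tree):
        if leaf.get("tag") == tag:
            return leaf["value"]
    raise KeyError(f"no leaf tagged {tag!r}")


# Hypothetical tree fragment; tags mirror metadata elements such as 208 e, 208 f, and 208 i.
tree = {
    "label": "document",
    "children": [
        {"label": "Profit", "tag": "P", "value": 47_500.0},  # hypothetical value
        {
            "label": "Customers",
            "children": [
                {"label": "Total Customers", "tag": "TC", "value": 950.0},
                {"label": "Returning Customers", "tag": "RC", "value": 600.0},
            ],
        },
    ],
}

new_customers = value_for(tree, "TC") - value_for(tree, "RC")
profit_per_customer = value_for(tree, "P") / value_for(tree, "TC")
print(new_customers, profit_per_customer)
```

Looking operands up by tag keeps a rule valid even when two documents place the same field at different positions in their trees.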

Example Mapping of Parsing Results to Insight Elements

FIG. 3 depicts an example mapping 300 of results to a set of insight elements 336, such as may be stored on server 102 of FIG. 1.

As illustrated, the example mapping 300 includes insight topics 302-306, rules 308-320 applied to one or more results associated with a tree (e.g., tree 204 of FIG. 2B), and insight elements 322-334. The server can obtain results from parsing a tree as described with respect to FIG. 2B.

In the depicted example mapping 300, insight elements (e.g., 322-334) are mapped to one or more results. For example, for the insight topic 302 “Returning Customers”, the server applies one or more related rules 308-312 to one or more related results from parsing a tree. In doing so, one or more operators may be used to compare results, sum results, divide results, or perform another operation on the results, which allows the results to be mapped to some of insight elements 322-334.

For example, for insight topic 302, if the result obtained from the tree for returning customers (e.g., “RC1” in rule 308) during a first time period (e.g., as shown by metadata element 212 e of FIG. 2B) is greater than a result for returning customers obtained from another tree associated with the same organization during a different time period (e.g., “RC2” in rule 308), the server maps the result for returning customers to insight element 322, which has a textual description of “You had more returning customers than last year! Good work, your business is trending up!” As a further example, if “RC1” is equal to “RC2”, the server maps the result for returning customers to insight element 324, which has a textual description of “You had the same amount of returning customers this year as last year, showing your customer retention is stable.” As yet another example, if “RC1” is less than “RC2”, the server maps the result for returning customers to insight element 326, which has a textual description of “You had fewer returning customers than last year. Unless you have more new customers, you should focus on retaining your customers.”
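The three-way comparison for insight topic 302 reduces to a small function that returns an insight element identifier. In this sketch the identifiers and texts are taken from FIG. 3, while the function name and the choice to return the numeric reference label are illustrative assumptions; `rc1` and `rc2` would come from the current tree and a prior-period tree for the same organization, and the prior count of 650 used below is hypothetical.

```python
from typing import Dict

# Insight elements for insight topic 302 ("Returning Customers"), per FIG. 3.
INSIGHT_ELEMENTS: Dict[int, str] = {
    322: "You had more returning customers than last year! Good work, "
         "your business is trending up!",
    324: "You had the same amount of returning customers this year as last "
         "year, showing your customer retention is stable.",
    326: "You had fewer returning customers than last year. Unless you have "
         "more new customers, you should focus on retaining your customers.",
}


def map_returning_customers(rc1: float, rc2: float) -> int:
    """Map current (rc1) vs. prior (rc2) returning customers to an insight element id."""
    if rc1 > rc2:
        return 322
    if rc1 == rc2:
        return 324
    return 326


element_id = map_returning_customers(600, 650)  # 650 is a hypothetical prior count
print(element_id, INSIGHT_ELEMENTS[element_id])
```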

Similarly, the server may map other results to separate insight elements based on other operators. One or more metadata elements associated with result topics (e.g., metadata elements 208 a-208 q of FIG. 2B) may be used in organizing and applying rules to results. For example, TI, E, P, and TC (shown by metadata elements 208 c, 208 d, 208 e, and 208 f of FIG. 2B) indicate that values for taxable income, expenses, profit, and total customers from leaves of the tree (e.g., 206 c, 206 d, 206 e, and 206 f of FIG. 2B) should be used as shown in rules 314-320. Specifically, metadata elements 208 c, 208 d, and 208 e indicate that the values of leaves 206 c, 206 d, and 206 e should be used when applying rule 314, and metadata elements 208 c and 208 d indicate that the values of leaves 206 c and 206 d should be used when applying rule 316. As a further example, metadata elements 208 c, 208 d, and 208 f may indicate that the values of leaves 206 c, 206 d, and 206 f should be used when applying rules 318 and 320.

Insight topics 302-306 are only exemplary, and other insight topics may be used. Similarly, rules 308-320 are only exemplary, and many more rules may be used by the server. Although only two or three rules are shown for each insight topic for simplicity, many more rules may be available for each insight topic. Likewise, insight elements 322-334 are only exemplary, and more insight elements may be used by the server.

In this example, applying the rules 308-320 maps the results to insight elements 326, 328, and 332, creating a set of insight elements 336 that is later delivered to insight generation model 116.

Further, in this example, the set of insight elements 336 is first delivered to ranking model 146 to order the insight elements 326, 328, and 332 in the set before the set is sent to the insight generation model. Rankings for the insight elements may be determined based on particular probabilities derived from user data associated with each insight element. Such probabilities may relate to how users interact with insights that include the insight element, such as a probability of being clicked for more information, a probability of being liked if users are asked, or a probability of being agreed with as accurate if users are asked. The ranking model may be a regression model, such as an XGBoost regression model, used to calculate a ranking score for each insight element based on the probabilities associated with that insight element. The ranking scores may then be used to order the insight elements 326, 328, and 332 in the set of insight elements.

Additionally, the ranking model 146 may use other information associated with the insight elements to produce the ranking scores. In one embodiment, the ranking score may be further based on a population coverage percentage and a category. For example, each insight element may be associated with a particular population coverage, meaning the percentage of users to whom the insight element has been provided; for instance, a result may be mapped to a given insight element for fifty percent (50%) of all documents. The regression model may also use that population coverage in determining the ranking score. Additionally, a category, such as good, bad, or neutral, may be assigned to each insight element. Each category may be associated with a numerical value, which may also be used by the regression model in determining a score for each insight element.
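A ranking score combining the user-interaction probabilities, population coverage, and a numeric category value could be computed as below. A fixed linear scoring function stands in for the trained regression model (for example, an XGBoost regressor), so the weights, the category encoding, and all feature values are hypothetical placeholders rather than learned or reported numbers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical numeric encoding of the good/bad/neutral category.
CATEGORY_VALUE: Dict[str, float] = {"bad": -1.0, "neutral": 0.0, "good": 1.0}

# Hypothetical weights standing in for a trained regression model.
WEIGHTS: List[float] = [0.4, 0.3, 0.3, 0.2, 0.1]
BIAS = 0.0


@dataclass
class InsightFeatures:
    element_id: int
    p_click: float   # probability users click for more information
    p_like: float    # probability users like the insight if asked
    p_agree: float   # probability users agree it is accurate if asked
    coverage: float  # share of documents mapped to this element, e.g. 0.5
    category: str    # "good", "bad", or "neutral"

    def vector(self) -> List[float]:
        return [self.p_click, self.p_like, self.p_agree,
                self.coverage, CATEGORY_VALUE[self.category]]


def ranking_score(features: InsightFeatures) -> float:
    """Linear score: bias plus the weighted sum of the feature vector."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features.vector()))


def order_elements(candidates: List[InsightFeatures]) -> List[int]:
    """Return element ids from highest to lowest ranking score."""
    ranked = sorted(candidates, key=ranking_score, reverse=True)
    return [c.element_id for c in ranked]


# Hypothetical feature values for the set of insight elements 336.
order = order_elements([
    InsightFeatures(326, 0.2, 0.1, 0.6, 0.5, "bad"),
    InsightFeatures(328, 0.5, 0.4, 0.7, 0.3, "good"),
    InsightFeatures(332, 0.3, 0.3, 0.5, 0.2, "neutral"),
])
print(order)
```

With a trained model, `ranking_score` would be replaced by the regressor's prediction on the same feature vector; the sorting step stays unchanged.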

In this depicted example, insight generation model 116 receives the set of insight elements 336 in the order determined by the ranking model and may generate an insight 338 based on the set of insight elements. The server may then return the insight 338 to a computing device (e.g., computing device 140 of FIG. 1) associated with a user who requested an insight for the document.

In some cases, the insight generation model 116 may be trained on the server. The insight generation model 116 may be trained, using NLP methods, to generate an insight when it receives as input a set of insight elements associated with a tree. In one embodiment, the insight generation model 116 may be a BERT model, a GPT-2 model, or a GPT-3 model. The insight generation model 116 may be trained using, as training data, known insights associated with sets of insight elements. In some cases, training the insight generation model 116 further includes removing certain words from a known insight and allowing the insight generation model 116 to fill in its best guess for the removed words during a training instance.
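The word-removal training step resembles masked-language-model pretraining. The sketch below only prepares one training example: it hides a fraction of the words in a known insight and records what the model should recover. Whitespace tokenization, the `[MASK]` token, the helper name `mask_words`, and the 25% masking rate are simplifying assumptions; BERT-style models operate on subword tokens and usually select masked positions through their own tokenizer and data pipeline.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def mask_words(
    text: str, rate: float, rng: random.Random
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Hide a fraction of the words in `text`.

    Returns the masked token list and the (position, original word)
    pairs the model would be trained to recover.
    """
    tokens = text.split()
    if not tokens:
        return [], []
    count = max(1, round(rate * len(tokens)))
    chosen = set(rng.sample(range(len(tokens)), count))
    masked = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    targets = [(i, tokens[i]) for i in sorted(chosen)]
    return masked, targets


# A known insight paired with its insight elements would form one training example.
known_insight = "You had fewer returning customers than last year."
masked, targets = mask_words(known_insight, rate=0.25, rng=random.Random(0))
print(" ".join(masked))
print(targets)
```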

In some embodiments, the insight generation model 116 may be trained to output an insight in which the wording of each insight element remains unchanged. In other cases, the insight generation model 116 may use NLP to construct the insight by combining the set of insight elements 336, including by adding words and punctuation. Additionally, the insight generation model 116 may combine two or more insight elements that have similar insight topics by amending parts of each insight element so that they fit together. Further, the insight generation model 116 may summarize the entire set of insight elements 336 to create a more compact insight.

Example Vector for Set of Insight Elements and Health Score for Document

FIG. 4 depicts an example processing 400 of vector 402 associated with a set of insight elements, such as set of insight elements 336 of FIG. 3, to generate a health score 410 for a document based on the set of insight elements.

As described with respect to FIG. 1, one or more documents may be received by a server, such as server 102. The documents may be converted into trees containing information about the documents, which may be parsed to obtain one or more results associated with each tree. The results for each tree may be mapped to a set of insight elements for that tree.

As depicted in this example, vector 402 is based on the set of insight elements 336 drawn from all insight elements 322-334, as described with respect to FIG. 3. As illustrated, vector 402 has a dimension for each possible insight element that could have been mapped (e.g., one dimension for each of insight elements 322-334). Dimensions associated with insight elements present in the set of insight elements for the tree (e.g., insight elements 326, 328, and 332) have a value of “1”, while dimensions associated with insight elements not present in the set (e.g., insight elements 322-324, 330, and 334) have a value of “0”. The seven dimensions shown are exemplary; more or fewer dimensions may be used, depending on how many possible insight elements may be mapped.
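The binary encoding of FIG. 4 can be reproduced directly: one position per known insight element, set to 1 when that element was mapped for the tree. The ordering of the hypothetical `ALL_ELEMENTS` list and the helper name `insight_vector` are assumptions; any fixed ordering works as long as training and scoring use the same one.

```python
from typing import List, Sequence, Set

# Every insight element that could be mapped, in a fixed order (FIGS. 3 and 4).
ALL_ELEMENTS: List[int] = [322, 324, 326, 328, 330, 332, 334]


def insight_vector(present: Set[int], universe: Sequence[int] = ALL_ELEMENTS) -> List[int]:
    """Multi-hot encoding: 1 if the element was mapped for the tree, else 0."""
    unknown = present - set(universe)
    if unknown:
        raise ValueError(f"unknown insight elements: {sorted(unknown)}")
    return [1 if element in present else 0 for element in universe]


vector_402 = insight_vector({326, 328, 332})
print(vector_402)
```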

As depicted, health score model 114 receives vector 402 and generates a health score 410 for a document based on the vector 402. In one embodiment, the health score model 114 may be an XGBoost regression model, a linear regression model, or a decision-tree regression model. The health score model 114 generates the health score based both on which insight elements are present in the set of insight elements associated with the vector 402 and on which are absent.

The health score for a document may comprise a score between “0” and “100” representing a probability that an associated organization will be in operation during a specific time period in the future. For example, the health score may be “80” if health score model 114 predicts that the organization associated with the document has an 80% chance of still being in operation six months after the document is received. In some embodiments, the health score may further be associated with other indications of whether the organization will be operating during the specific time period, such as a color (e.g., “green”) representing the likelihood that the organization will be operating. Further, a health score may be associated with one or more specific words indicating that likelihood, such as “unhealthy”, “healthy”, or “thriving”.

For example, health score 410 has a score of 78, indicating a 78% chance of the associated organization still operating during the specified future time period. Additionally, health color 412 may be associated with a certain range of health scores (e.g., green may be associated with a range of 75-100). Further, health indicator 414 may also be associated with a certain range of health scores (e.g., “thriving” may be associated with a range of 70-100). These various representations of the health score may be useful in the various user interfaces in which it is displayed, such as on computers, mobile devices, and the like.
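A Python sketch of how a predicted probability might be turned into the score, color, and word forms described above. Only “green” for 75-100 and “thriving” for 70-100 come from this description; the remaining bands and the “red” and “yellow” colors are illustrative assumptions, and the function names are hypothetical:

```python
from bisect import bisect_right

# (inclusive lower bound, label) pairs sorted by lower bound. Only "green" for
# 75-100 and "thriving" for 70-100 come from the description; the other bands
# and the "red"/"yellow" colors are assumptions.
COLOR_BANDS = [(0, "red"), (40, "yellow"), (75, "green")]
INDICATOR_BANDS = [(0, "unhealthy"), (40, "healthy"), (70, "thriving")]


def band_label(score: float, bands: list[tuple[float, str]]) -> str:
    """Return the label of the highest band whose lower bound is <= score."""
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    lower_bounds = [lower for lower, _ in bands]
    return bands[bisect_right(lower_bounds, score) - 1][1]


def present_health(probability: float) -> dict:
    """Turn a predicted probability of operating into the display forms of FIG. 4."""
    score = round(probability * 100)
    return {
        "score": score,
        "color": band_label(score, COLOR_BANDS),
        "indicator": band_label(score, INDICATOR_BANDS),
    }


print(present_health(0.78))  # {'score': 78, 'color': 'green', 'indicator': 'thriving'}
```

With these example ranges, a score of 72 would be “thriving” but not “green”, since the two scales need not share boundaries.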

After generating the health score 410 based on the vector 402, the health score model 114 may return the health score 410 along with any associated information to the server. The server may later return the health score 410 along with any associated information to a user in response to a request for the health score.

In some embodiments, the health score model 114 may be trained on the server by labeling sets of insight elements associated with trees with historical health scores. The server may create a vector based on each set of insight elements associated with each tree and the entire set of insight elements stored on the server (e.g., in insights 130 of FIG. 1). Similarly to vector 402, each vector used to train the health score model 114 may be binary, where a dimension for an insight element present in the set of insight elements has a value of “1”, and a dimension for an insight element not present in the set of insight elements has a value of “0”.

The historical health score may indicate whether an organization associated with a given document and tree was still in operation during a certain time period after the document was received, such as six months after receiving the document. In some cases, the historical health score for each organization associated with the documents may be received from a computing device, such as computing device 144 of FIG. 1. In other cases, the server may access a network to determine whether each organization was still operating during the certain time period.

In one embodiment, the historical health scores, which may be used as labels for training health score model 114, may be binary, where a historical health score is “0” if the organization associated with the document was not operating during the certain time period and “1” if the organization associated with the document was operating during the certain time period.

In some embodiments, the historical health score may not be binary, and may instead represent a probability that the organization associated with the document was in operation during the time period. For example, the historical health score may be “0.25” if it cannot be determined whether the organization was operating during the certain time period, but circumstances surrounding the organization indicate that the organization was likely not operating. As another example, the historical health score may be “0.75” if, again, it cannot be determined whether the organization was operating during the certain time period, but circumstances surrounding the organization indicate that the organization was likely operating. Additionally, while six months after receiving the document is given as an example of the certain time period, many different time periods may be used.
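Linear regression is one of the options named for health score model 114. A minimal pure-Python sketch of that option follows, assuming binary insight-element vectors and labels that are either binary or soft (such as 0.25 or 0.75). The toy data, learning rate, epoch count, and the clamping of predictions to [0, 1] before scaling to 0-100 are assumptions, not details from the disclosure:

```python
def train_linear_regression(vectors, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on mean squared error."""
    n_samples, n_features = len(vectors), len(vectors[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(vectors, labels):
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def health_score(weights, bias, vector):
    """Clamp the regression output to [0, 1] and scale it to a 0-100 health score."""
    raw = sum(w * xi for w, xi in zip(weights, vector)) + bias
    return round(100 * min(1.0, max(0.0, raw)))


# Toy training set over two hypothetical insight elements. Labels mix binary
# outcomes (0 = not operating, 1 = operating) with soft labels (0.25, 0.75).
vectors = [[0, 0], [1, 0], [0, 1], [1, 1]]
labels = [0.25, 1.0, 0.0, 0.75]

weights, bias = train_linear_regression(vectors, labels)
print([health_score(weights, bias, v) for v in vectors])  # [25, 100, 0, 75]
```

Swapping in an XGBoost or decision-tree regressor would mainly change the training step; the binary input encoding and the 0-100 output scaling could stay the same.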

Example Analysis Containing Insight and Health Score

FIG. 5 shows example analysis 500 containing an insight 338 and health score 410 based on a document (e.g., document 200 of FIG. 2A) that is returned to a user based on a request to generate an insight and a health score. In some cases, the analysis may be based on multiple documents.

As described with respect to FIGS. 3-4, a set of insight elements, such as set of insight elements 336 of FIG. 3, is mapped from results of parsing a tree, such as tree 204 of FIG. 2B. The set of insight elements is input to a health score model and/or an insight generation model, such as health score model 114 of FIGS. 1 and 4 and insight generation model 116 of FIGS. 1 and 3, in order to generate a health score 410 and insight 338. In some cases, the health score model may receive the set of insight elements as a vector representation, such as in FIG. 4.

Analysis 500 displays one or more characteristics of an organization associated with the document, such as the name of the organization (e.g., “Young Business, Inc.”), in characteristics portion 502. The one or more characteristics may be based on information extracted from a tree associated with the document.

Analysis 500 further displays insight portion 504, which includes an insight 338 for the document created from the set of insight elements associated with the tree. Insight 338 may be generated by an insight generation model based on natural language processing techniques. In this depicted example, insight 338 may include one or more sentences created based on the textual descriptions of the insight elements of the set of insight elements for the tree. In some cases, insight 338 includes words and punctuation generated by the insight generation model in order to stitch the insight elements together. Additionally, insight 338 may include sentences that combine multiple insight elements. For example, insight sentence 508 may be based on two or more insight elements, such as insight elements 326 and 328 of FIG. 3, as well as other possible insight elements that were combined, rearranged, and/or summarized by insight generation model 116.

In this example, the insight 338 is in the form of a paragraph based on the textual descriptions of the set of insight elements associated with the tree. In other embodiments, the insight 338 may be multiple separate sentences or paragraphs constructed from the insight elements and accompanying descriptions of the rules that mapped the results to the insight elements. In one embodiment, the insight 338 may be multiple links leading to the textual descriptions of insight elements of the set of insight elements and the calculations that led to mapping those insight elements.

In the depicted example, insight 338 further includes a link 510 that, when selected, displays one or more relevant calculations that the server performed in mapping the set of insight elements. In some embodiments, the link 510 may not be present in the insight 338.

Analysis 500 further includes a health score portion 506 containing a health score 410 for an organization associated with the document. The health score 410 may be generated by the health score model (e.g., health score model 114 of FIGS. 1 and 4). In some embodiments, health score portion 506 may further include other information associated with the health score 410 that is derived from the set of insight elements, such as a color indicating the health of the organization (e.g., health color 412 of FIG. 4) or a word indicating the health of the organization (e.g., health indicator 414 of FIG. 4).

Thus, the analysis 500 may be returned to a user who requested an insight and a health score based on a document associated with a particular organization. In some embodiments, analysis 500 may contain any of characteristics portion 502, insight portion 504, and health score portion 506, alone or in combination.
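One way analysis 500 might be represented as a data structure, sketched in Python under the assumption that it is serialized as JSON for the requesting computing device. The class and field names are hypothetical. Absent portions are omitted, matching the statement that the portions may appear alone or in combination:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HealthPortion:
    """Health score portion 506: the score plus optional color and word forms."""
    score: int
    color: Optional[str] = None
    indicator: Optional[str] = None


@dataclass
class Analysis:
    """Analysis 500; every portion is optional."""
    characteristics: Optional[dict] = None  # characteristics portion 502
    insight: Optional[str] = None           # insight portion 504
    health: Optional[HealthPortion] = None  # health score portion 506

    def to_json(self) -> str:
        """Serialize only the portions that are present."""
        payload = {name: value for name, value in asdict(self).items() if value is not None}
        return json.dumps(payload)


analysis = Analysis(
    characteristics={"name": "Young Business, Inc."},
    health=HealthPortion(score=78, color="green", indicator="thriving"),
)
print(analysis.to_json())
```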

Example Process of Training Models and Generating Insights and Health Scores

FIG. 6 depicts example process 600 for generating an insight 338 and a health score 410 for a document 200. In some embodiments, process 600 may be performed by a server, such as server 102 of FIG. 1.

At step 605, after a set of documents 602 is received, the set of documents 602 is converted into a set of trees 604 (e.g., with conversion component 120 of FIG. 1). In some embodiments, the documents may be historical documents. Each tree converted from a document may contain one or more descriptions and one or more values associated with fields within the document (e.g., in leaves 206a-206q of FIG. 2B). The server may use multiple methods of converting a document into a tree, such as, but not limited to, using one or more recursive functions to construct a tree from a document with one or more tabular fields.
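A minimal sketch of the recursive conversion mentioned above, assuming the document's tabular fields have already been extracted into nested Python dictionaries (section name to sub-sections, line-item description to numeric value). The `Node` type, `build_tree`, `leaves`, and the sample figures are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Node:
    """A tree node holding a field description and, for leaves, a value."""
    description: str
    value: Optional[float] = None
    children: list["Node"] = field(default_factory=list)


def build_tree(description: str, content: Any) -> Node:
    """Recursively convert nested sections into a tree; numeric fields become leaves."""
    if isinstance(content, dict):
        return Node(description, children=[build_tree(k, v) for k, v in content.items()])
    return Node(description, value=float(content))


def leaves(node: Node) -> Iterator[tuple[str, Optional[float]]]:
    """Yield (description, value) for every leaf, depth-first, left to right."""
    if not node.children:
        yield node.description, node.value
    for child in node.children:
        yield from leaves(child)


document = {
    "Income": {"Sales": 120000, "Services": 30000},
    "Expenses": {"Rent": 24000, "Payroll": 80000},
}
tree = build_tree("Profit and Loss", document)
print(list(leaves(tree)))
# [('Sales', 120000.0), ('Services', 30000.0), ('Rent', 24000.0), ('Payroll', 80000.0)]
```

Section headings become interior nodes and line items become leaves, which mirrors the description-and-value leaves of FIG. 2B.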

At step 610, the set of trees 604 is parsed to generate a set of results 606 for each tree of set of trees 604 (e.g., with parsing component 122 of FIG. 1). As described above, the parsing component may apply rules in order to parse the one or more trees, where the rules are associated with the set of trees 604 and are received from a computing device (e.g., computing device 142 of FIG. 1) or defined on the server. Parsing the one or more trees in set of trees 604 may include retrieving the one or more descriptions and one or more values associated with the fields within a document and analyzing those descriptions and values. The one or more descriptions and one or more values may further be manipulated into a different set of descriptions and values. The parsing component may also compare values between two or more trees in order to determine new values based on the values of the two or more trees. Each tree may be associated with an organization, and in some cases, more than one document may be associated with the same organization.
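The following Python sketch illustrates rule-based parsing with one rule that works within a single tree and one that compares two trees of the same organization. It assumes each tree has been flattened to a mapping from leaf description to value; the rule functions, field names, and figures are hypothetical:

```python
from typing import Callable, Optional

FlatTree = dict[str, float]  # leaf description -> value (assumed flattened form)
Rule = Callable[[FlatTree, Optional[FlatTree]], dict[str, float]]


def net_income_rule(current: FlatTree, previous: Optional[FlatTree]) -> dict[str, float]:
    """Within-tree rule: derive a new value from two values in the same tree."""
    return {"net_income": current["Total Income"] - current["Total Expenses"]}


def new_customers_rule(current: FlatTree, previous: Optional[FlatTree]) -> dict[str, float]:
    """Cross-tree rule: compare the same field across two documents."""
    if previous is None:
        return {}  # no earlier tree to compare against
    return {
        "new_customers": current["Customers"] - previous["Customers"],
        "prior_customers": previous["Customers"],
    }


def parse(current: FlatTree, previous: Optional[FlatTree], rules: list[Rule]) -> dict[str, float]:
    """Apply every rule and merge the named results into one set of results."""
    results: dict[str, float] = {}
    for rule in rules:
        results.update(rule(current, previous))
    return results


tree_2019 = {"Customers": 1000, "Total Income": 150000, "Total Expenses": 120000}
tree_2020 = {"Customers": 2000, "Total Income": 210000, "Total Expenses": 150000}
print(parse(tree_2020, tree_2019, [net_income_rule, new_customers_rule]))
# {'net_income': 60000, 'new_customers': 1000, 'prior_customers': 1000}
```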

At step 615, the sets of results 606 are mapped to sets of insight elements 612 (e.g., at mapping component 124 of FIG. 1). The sets of insight elements 612 may be determined from a set of insight elements 608 that contains all possible insight elements. In some embodiments, each set of insight elements 612 may be a subset of set of insight elements 608. The insight elements may be received from a computing device, such as computing device 142 of FIG. 1, or may already be stored on the server (e.g., in insights 130 of FIG. 1).

Each insight element may be related to one or more rules so that an insight element may be determined based on the results of applying the related one or more rules. Further, each insight element may contain a short textual description of the result. For example, if the parsing component, by executing certain rules, determines that a document from 2019 shows an organization had one thousand (1,000) customers, and later determines that a document from 2020 shows the same organization had two thousand (2,000) customers, the parsing component might return a result of 1,000 related to the organization's new customers. Afterwards, the mapping component receives the results and applies further rules to them to determine a particular insight element related to the result, which may contain a short textual description of the result, such as “Your business served twice as many customers this year as it did last year!” or “Your business served more customers than it did last year, so business is trending up.” Further, in some cases, more than one result may be used to determine one insight element. In some cases, not all results may be mapped to a specific insight element.
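Continuing the customer example, a Python sketch of mapping rules, assuming the parsing step produced both the number of new customers and the prior year's count (so that doubling can be checked). The `InsightElement` type, the rule names, and the thresholds are hypothetical. Two results determine one insight element here, and a set of results can also map to none:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class InsightElement:
    element_id: str
    topic: str
    description: str


CUSTOMERS_DOUBLED = InsightElement(
    "customers_doubled", "customers",
    "Your business served at least twice as many customers this year as last year!")
CUSTOMERS_UP = InsightElement(
    "customers_up", "customers",
    "Your business served more customers than it did last year, so business is trending up.")

MappingRule = Callable[[dict[str, float]], Optional[InsightElement]]


def customer_growth_rule(results: dict[str, float]) -> Optional[InsightElement]:
    """Map two parsing results to at most one customer-growth insight element."""
    new = results.get("new_customers")
    prior = results.get("prior_customers")
    if new is None or prior is None or new <= 0:
        return None  # not every result maps to an insight element
    return CUSTOMERS_DOUBLED if new >= prior else CUSTOMERS_UP


def map_results(results: dict[str, float], rules: list[MappingRule]) -> list[InsightElement]:
    """Apply each mapping rule and collect the insight elements that are produced."""
    elements = []
    for rule in rules:
        element = rule(results)
        if element is not None:
            elements.append(element)
    return elements


mapped = map_results({"new_customers": 1000, "prior_customers": 1000}, [customer_growth_rule])
print([element.element_id for element in mapped])  # ['customers_doubled']
```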

The results of parsing multiple trees, or the results of parsing a single tree, may be used to determine the insight elements for one specific tree. For example, the mapping component may compare the result of parsing a first tree to the result of parsing a second tree in mapping one insight element for the first tree, but may use only the results of parsing the first tree in mapping another insight element for that tree. In one embodiment, the document analyzer may use only the results of parsing the single tree.

At step 620, an insight is created for each set of insight elements 612, yielding insights 614.

In the depicted example, at steps 625 and 630, the sets of insight elements 612 and the insights 614 are sent to insight generation model 116. At step 635, the insight generation model 116 may then be trained, as described with respect to FIGS. 1 and 3, to generate an insight when receiving a set of insight elements as input. The insight generation model 116 is trained by labeling each set of insight elements 612 with an insight of insights 614 and using each pair of a set of insight elements and its insight as a training instance. The trained insight generation model 116 may receive a set of insight elements as input and determine an optimal way to combine, amend, or summarize the textual descriptions of each insight element to create an understandable and compact insight.
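If insight generation model 116 is a sequence-to-sequence text model (an assumption; the disclosure does not name an architecture), each training instance could be serialized as an input text built from the insight-element descriptions and a target text holding the insight. A hypothetical Python sketch that writes such pairs as JSON Lines; the field names and example texts are illustrative only:

```python
import json


def to_training_instance(element_descriptions: list[str], insight: str) -> dict[str, str]:
    """Pair a set of insight elements (model input) with its insight (training label)."""
    source = "\n".join(f"- {description}" for description in element_descriptions)
    return {"input": source, "target": insight}


def to_jsonl(instances: list[dict[str, str]]) -> str:
    """Serialize training instances as JSON Lines, one instance per line."""
    return "".join(json.dumps(instance) + "\n" for instance in instances)


instance = to_training_instance(
    [
        "Your business served at least twice as many customers this year as last year.",
        "Your net income increased compared with last year.",
    ],
    "Customer count doubled and net income rose, so the business is trending up.",
)
print(to_jsonl([instance]), end="")
```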

In some embodiments, the server may limit the number or topics of insight elements that may be present in the insight for a document. For example, the server may use only a specified number of insight elements to generate input to insight generation model 116, and this number may be specified by a user. In some embodiments, the insight generation model 116 may include in a generated insight only insight elements associated with one or more particular insight topics (e.g., “customers” or “income”), which may be related to metadata in a tree. The one or more topics of insight elements may be specified by a user.
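A minimal Python sketch of the two limits described above, applied before the input to insight generation model 116 is built. The function name and topic labels are hypothetical, and the user-specified limit and topics are passed in as plain arguments:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InsightElement:
    element_id: str
    topic: str
    description: str


def select_elements(
    elements: list[InsightElement],
    max_count: Optional[int] = None,
    topics: Optional[set[str]] = None,
) -> list[InsightElement]:
    """Keep only elements in the requested topics, then cap how many are passed on.

    Input order is preserved, so a ranked list keeps its highest-ranked elements.
    """
    selected = [e for e in elements if topics is None or e.topic in topics]
    return selected if max_count is None else selected[:max_count]


elements = [
    InsightElement("customers_up", "customers", "Customer count grew."),
    InsightElement("rent_up", "expenses", "Rent expense grew."),
    InsightElement("income_up", "income", "Net income grew."),
]
chosen = select_elements(elements, max_count=1, topics={"customers", "income"})
print([e.element_id for e in chosen])  # ['customers_up']
```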

At step 640, a historical health score 616 for each set of insight elements 612 is determined based on whether or not an organization associated with a corresponding tree was operating during a certain time period.

At steps 645 and 650, the sets of insight elements 612 and the corresponding historical health scores 616 are sent to a health score model 114. At step 655, the health score model 114 may then be trained to generate a health score when receiving a set of insight elements as input. In this example, training the health score model 114 includes labeling each set of insight elements of sets of insight elements 612 with a historical health score 616 and using each set of insight elements and its corresponding historical health score as a training instance.

At step 660, the document 200 is converted into a tree 204 (e.g., at conversion component 120 of FIG. 1).

At step 665, the tree 204 is parsed with the rules to obtain a set of results 618 for the tree 204. Similarly to parsing the set of trees 604, the tree 204 may be parsed by comparing one or more values from the tree to one or more values from other trees. The tree 204 may further be parsed by comparing values within the tree 204 alone.

At step 670, the set of results 618 is mapped to a set of insight elements 336 (e.g., at mapping component 124 of FIG. 1).

At step 675, the server provides the set of insight elements 336 to the trained insight generation model 116 in order to generate an insight 338. In some embodiments, the set of insight elements 336 may be provided in an order determined by a ranking model, such as ranking model 146 of FIG. 1. Similarly, at step 680, the server may provide the set of insight elements 336 to the trained health score model 114 in order to generate a health score 410.

At step 685, the insight 338 is generated for the document 200 (e.g., at insight component 106 of FIG. 1).

Finally, at step 690, the health score 410 is generated for the document 200 (e.g., at health score component 104 of FIG. 1). The server may provide both the insight 338 and the health score 410 to a computing device (e.g., computing device 140 of FIG. 1), for example, based on a request of the user from the computing device.

Example Method of Generating an Insight and Health Score

FIG. 7 depicts an example method 700 of generating an insight, such as described above with respect to FIGS. 1-6. In some embodiments, method 700 may be performed by a processing system, such as server 102 of FIG. 1.

Method 700 begins at step 702 with receiving a request to generate an insight for a document, such as document 200 of FIG. 2A, from a user associated with the document. The request may be sent by a computing device associated with the user, such as computing device 140 of FIG. 1. The document may be associated with an organization.

Method 700 then proceeds to step 704 with receiving the document from the computing device.

Method 700 then proceeds to step 706 with generating a tree, such as tree 204 of FIGS. 2B and 6, based on the document (e.g., by using conversion component 120 of FIG. 1).

Method 700 then proceeds to step 708 with receiving a set of rules and a set of insight elements associated with a set of trees including the tree. In some embodiments, the set of trees may include the set of trees 604 of FIG. 6. In one embodiment, the set of trees may be converted from historical documents. In another embodiment, the set of rules and set of insight elements may be received from a computing device associated with an expert associated with the historical documents (e.g., computing device 142 of FIG. 1). In yet another embodiment, the set of rules and set of insight elements may be defined on a server (e.g., server 102 of FIG. 1). The set of rules and set of insight elements may be stored on the server (e.g., at rules 128 and insights 130 of FIG. 1, respectively).

Method 700 then proceeds to step 710 with parsing the tree based on the set of rules to generate a set of results, such as the set of results 618 of FIG. 6, associated with the tree.

Method 700 then proceeds to step 712 with mapping the set of results to a subset of insight elements associated with the document, such as set of insight elements 336 of FIGS. 3 and 6. The server may map the results to the subset of insight elements by applying rules to the set of results in order to map each specific result to a specific insight element of a set of all possible insight elements (e.g., set of insight elements 608 of FIG. 6).

Method 700 then proceeds to step 714 with providing the subset of insight elements to a first machine-learning model (e.g., insight generation model 116 of FIGS. 1, 3, and 6), which may be trained to generate an insight, such as insight 338 of FIGS. 3, 5, and 6, when receiving a subset of insight elements as input. In some embodiments, the subset of insight elements may also be provided to a second machine-learning model (e.g., health score model 114 of FIGS. 1, 4, and 6), which may be trained to generate a health score, such as health score 410 of FIGS. 4, 5, and 6, when receiving a subset of insight elements as input. The health score may indicate a probability that an organization associated with the document will be operating during a certain future time period. In some embodiments, the subset of insight elements may be provided to the first machine-learning model in an order based on ranking scores determined by a third machine-learning model for each insight element according to one or more criteria.

In some embodiments, the method 700 may include training the various machine-learning models. A set of documents associated with the set of insight elements may be received and converted into a set of trees representing the documents. The documents may be historical. The set of trees may be parsed to obtain a set of results for each tree in the set of trees, where each set of results may be mapped to a respective set of insight elements. An insight may be determined for each historical document based on the respective set of insight elements, and each pair of the insight and the respective set of insight elements may be used as a training instance in training the first machine-learning model to output an insight when receiving a set of insight elements as input. Similarly, each respective set of insight elements may be labeled with a historical health score based on whether an organization associated with a historical document was operating during a certain time period, and each pair of a historical health score and a respective set of insight elements may be used as a training instance in training the second machine-learning model to output a health score when receiving a set of insight elements as input.

Method 700 then proceeds to step 716 with receiving the insight for the document from the first machine-learning model.

Method 700 then proceeds to step 718 with returning the insight for the document to the user. The insight may be returned by being sent to the computing device associated with the user, such as computing device 140. In some embodiments, if a health score was generated for the document, the health score may be returned to the computing device associated with the user, alone or in combination with the insight.
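The steps of method 700 can be sketched as a small Python pipeline in which each component is injected as a callable. This is an illustrative skeleton under stated assumptions, not the server's implementation: the class and field names are hypothetical, the set of rules received at step 708 is assumed to be bound into the parsing callable, and the lambdas in the usage example stand in for the conversion, parsing, and mapping components and for the trained models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class InsightPipeline:
    to_tree: Callable[[Any], Any]                   # step 706: document -> tree
    parse: Callable[[Any], dict]                    # step 710: tree -> set of results
    map_results: Callable[[dict], list]             # step 712: results -> insight elements
    insight_model: Callable[[list], str]            # steps 714-716: first ML model
    health_model: Optional[Callable[[list], float]] = None  # optional second ML model
    rank: Optional[Callable[[list], list]] = None   # optional ordering (third ML model)

    def handle_request(self, document: Any) -> dict:
        """Run steps 706-716 for a received document and build the response."""
        tree = self.to_tree(document)
        results = self.parse(tree)
        elements = self.map_results(results)
        if self.rank is not None:
            elements = self.rank(elements)
        response = {"insight": self.insight_model(elements)}
        if self.health_model is not None:
            response["health_score"] = self.health_model(elements)
        return response  # step 718: returned to the requesting user


pipeline = InsightPipeline(
    to_tree=lambda document: document,
    parse=lambda tree: {"new_customers": tree["Customers"] - tree["Prior Customers"]},
    map_results=lambda results: (
        ["Customer count grew year over year."] if results["new_customers"] > 0 else []
    ),
    insight_model=lambda elements: " ".join(elements) or "No notable changes.",
    health_model=lambda elements: 80.0 if elements else 50.0,
)
print(pipeline.handle_request({"Customers": 1200, "Prior Customers": 1000}))
# {'insight': 'Customer count grew year over year.', 'health_score': 80.0}
```

Keeping each stage behind a plain callable lets the health score and ranking steps be enabled or omitted, matching the embodiments in which only the insight is returned.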

Example Processing Device

FIG. 8 depicts an example processing device 800 that may be configured to perform the methods described herein, such as method 700 described with respect to FIG. 7. In various embodiments, the processing device 800 can be a physical processing device, while in other embodiments, the processing device may be a virtual (e.g., cloud) processing device (e.g., a virtual machine operating in a cloud service infrastructure).

In this example, processing device 800 includes a central processing unit (CPU) 802 connected to a data bus 812. CPU 802 is configured to process computer-executable instructions, e.g., stored in memory 814, and to cause the processing device 800 to perform methods described herein, for example, with respect to FIG. 7. CPU 802 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and other forms of processing architecture capable of executing computer-executable instructions.

Processing device 800 further includes input/output (I/O) device(s) 808 and I/O device interfaces 804, which allow processing device 800 to interface with input/output devices 808, such as, for example, keyboards, displays, mouse devices, pen input, and other devices that allow for interaction with processing device 800. Note that processing device 800 may connect with external I/O devices through physical and wireless connections (e.g., an external display device).

Processing device 800 further includes a network interface 806, which provides processing device 800 with access to external network 810 and thereby to external personal devices, such as computing devices 140, 142, and 144 of FIG. 1.

Processing device 800 further includes memory 814, which in this example includes a health score component 816, an insight component 818, a ranking component 820, a document analyzer 822, and databases 838.

In the depicted example, document analyzer 822 further includes conversion component 832, parsing component 834, and mapping component 836. Conversion component 832 converts documents (e.g., set of documents 602 of FIG. 6 and document 200 of FIGS. 2A and 6) received at the processing device 800 into trees (e.g., set of trees 604 of FIG. 6 and tree 204 of FIGS. 2B and 6). Further, parsing component 834 parses one or more trees in order to determine results (e.g., set of results 606 and set of results 618 of FIG. 6) associated with those trees by applying rules to values in the trees. Mapping component 836 maps the results to insight elements received at the processing device 800 by applying one or more rules to the results. Mapping component 836 may then gather the mapped insight elements into a subset of insight elements (e.g., sets of insight elements 612 of FIG. 6 or set of insight elements 336 of FIGS. 3 and 6) associated with the document.

Health score component 816 further includes vectorizer 824 and health score model 826. Vectorizer 824 converts a set of insight elements associated with a tree into a vector, where each dimension corresponds to a specific insight element in a larger set of insight elements. Health score model 826 may receive the vector created from the set of insight elements and output a health score based on the vector.

Insight component 818 further includes an insight generation model 828. Insight generation model 828 may be trained to receive a set of insight elements and generate an insight as output. Insight generation model 828 may further modify, combine, or summarize the insight elements in the set of insight elements while generating the insight.

In this depicted example, ranking component 820 further includes a vectorizer 830, which may create a vector for each insight element based on probabilities associated with that insight element, and ranking model 840, which may calculate a ranking score for each insight element in a set of insight elements. The ranking component 820 may then order the insight elements in a set of insight elements based on the ranking scores calculated from the vector for each insight element.

Alternatively, the ranking component may rank the insight elements based on an importance score received from an external computing device (e.g., computing device 142 of FIG. 1) without the use of a machine-learning model.
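A Python sketch of the ordering step performed by ranking component 820, covering both alternatives: scores from a ranking model and importance scores received from an expert's computing device. The linear scorer below is only a stand-in for ranking model 840, whose form the disclosure does not specify, and the per-element feature values and weights are hypothetical:

```python
from typing import Mapping


def linear_ranking_score(features: list[float], weights: list[float]) -> float:
    """Stand-in ranking model: weighted sum of an element's probability features."""
    return sum(w * x for w, x in zip(weights, features))


def order_elements(elements: list[str], scores: Mapping[str, float]) -> list[str]:
    """Order insight elements by descending score; ties keep their input order."""
    return sorted(elements, key=lambda element: scores[element], reverse=True)


elements = ["customers_up", "expenses_up", "income_up"]

# Alternative 1: scores computed from a vector of probabilities per element.
features = {"customers_up": [0.9, 0.2], "expenses_up": [0.4, 0.8], "income_up": [0.6, 0.6]}
model_scores = {e: linear_ranking_score(features[e], [0.7, 0.3]) for e in elements}
print(order_elements(elements, model_scores))  # ['customers_up', 'income_up', 'expenses_up']

# Alternative 2: importance scores received from an expert, no model involved.
expert_scores = {"expenses_up": 5, "income_up": 3, "customers_up": 1}
print(order_elements(elements, expert_scores))  # ['expenses_up', 'income_up', 'customers_up']
```

Python's `sorted` remains stable with `reverse=True`, so elements with equal scores keep the order in which they were mapped.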

Databases 838 may store various files and data for the processing device 800. For example, databases 838 may store all received documents in one database, while storing all rules, insights, trees, health scores, and rankings in their own respective databases.

Note that while shown as a single memory 814 in FIG. 8 for simplicity, the various aspects stored in memory 814 may be stored in different physical memories, but all accessible by CPU 802 via internal data connections such as bus 812. While not depicted, other aspects may be included in memory 814.

Example Clauses

Implementation examples are described in the following numbered clauses:

Clause 1: A method for generating an insight, comprising: receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.

Clause 2: The method of Clause 1, further comprising: providing, to a second machine-learning model, the subset of insight elements; receiving, from the second machine-learning model, a health score for the document; and returning the health score for the document to the user.

Clause 3: The method of any one of Clauses 1-2, further comprising: determining a ranking score for each insight element in the subset of insight elements according to one or more criteria; and ordering the insight elements of the subset of insight elements based on the ranking score for each insight element, wherein providing the subset of insight elements to the first machine-learning model comprises providing the subset of insight elements based on the ordering.

Clause 4: The method of Clause 2, wherein the document is associated with an organization, wherein the health score for the document indicates a probability of the organization being in operation during a time period.

Clause 5: The method of any one of Clauses 2-4, further comprising: receiving the set of rules and the set of insight elements associated with the set of trees including the tree; receiving a set of historical documents associated with the set of insight elements; converting the set of historical documents into a set of historical trees, wherein the set of trees further comprises the set of historical trees; parsing the set of historical trees to obtain a set of historical results for each historical tree; mapping each set of historical results to a respective subset of insight elements; determining an insight for each historical document based on the respective subset of insight elements for each set of historical results; training the first machine-learning model based on the respective subset of insight elements for each historical tree and the determined insight for each historical document; labeling the respective subset of insight elements for each set of historical results with a historical health score; and training the second machine-learning model to generate a health score based on each respective subset of insight elements.

Clause 6: The method of Clause 5, wherein: labeling each subset of insight elements for each set of historical results with a historical health score is based on whether an organization associated with each historical document was operating during a first time period; and the health score for the document indicates a probability that an organization associated with the document will be operating after a second time period.

Clause 7: The method of any one of Clauses 5-6, wherein receiving the set of rules and the set of insight elements comprises receiving the set of rules and set of insight elements from a computing device associated with an expert associated with the set of historical documents.

Clause 8: A processing system, comprising: a memory comprising computer-executable instructions; one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any one of Clauses 1-7.

Clause 9: A processing system, comprising means for performing a method in accordance with any one of Clauses 1-7.

Clause 10: A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any one of Clauses 1-7.

Clause 11: A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any one of Clauses 1-7.

Other Considerations

The preceding description is provided to enable any person skilled in the art to practice the various embodiments described herein. The examples discussed herein are not limiting of the scope, applicability, or embodiments set forth in the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.

As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.

As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).

As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory) and the like. Also, “determining” may include resolving, selecting, choosing, establishing and the like.

The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.

The following claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.

What is claimed is:
1. A method for generating an insight, comprising: receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.
2. The method of claim 1, further comprising: providing, to a second machine-learning model, the subset of insight elements; receiving, from the second machine-learning model, a health score for the document; and returning the health score for the document to the user.
3. The method of claim 1, further comprising: determining a ranking score for each insight element in the subset of insight elements according to one or more criteria; and ordering the insight elements of the subset of insight elements based on the ranking score for each insight element, wherein providing the subset of insight elements to the first machine-learning model comprises providing the subset of insight elements based on the ordering.
4. The method of claim 2, wherein the document is associated with an organization, wherein the health score for the document indicates a probability of the organization being in operation during a time period.
5. The method of claim 2, further comprising: receiving the set of rules and the set of insight elements associated with the set of trees including the tree; receiving a set of historical documents associated with the set of insight elements; converting the set of historical documents into a set of historical trees, wherein the set of trees further comprises the set of historical trees; parsing the set of historical trees to obtain a set of historical results for each historical tree; mapping each set of historical results to a respective subset of insight elements; determining an insight for each historical document based on the respective subset of insight elements for each set of historical results; training the first machine-learning model based on the respective subset of insight elements for each historical tree and the determined insight for each historical document; labeling the respective subset of insight elements for each set of historical results with a historical health score; and training the second machine-learning model to generate a health score based on each respective subset of insight elements.
6. The method of claim 5, wherein: labeling each subset of insight elements for each set of historical results with a historical health score is based on whether an organization associated with each historical document was operating during a first time period; and the health score for the document indicates a probability that an organization associated with the document will be operating after a second time period.
7. The method of claim 5, wherein receiving the set of rules and the set of insight elements comprises receiving the set of rules and set of insight elements from a computing device associated with an expert associated with the set of historical documents.
8. A system comprising: a processor; and a memory storing instructions, which when executed by the processor perform a method for generating an insight, comprising: receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.
9. The system of claim 8, the method further comprising: providing, to a second machine-learning model, the subset of insight elements; receiving, from the second machine-learning model, a health score for the document; and returning the health score for the document to the user.
10. The system of claim 8, the method further comprising: determining a ranking score for each insight element in the subset of insight elements according to one or more criteria; and ordering the insight elements of the subset of insight elements based on the ranking score for each insight element, wherein providing the subset of insight elements to the first machine-learning model comprises providing the subset of insight elements based on the ordering.
11. The system of claim 9, wherein the document is associated with an organization, wherein the health score for the document indicates a probability of the organization being in operation during a time period.
12. The system of claim 9, the method further comprising: receiving the set of rules and the set of insight elements associated with the set of trees including the tree; receiving a set of historical documents associated with the set of insight elements; converting the set of historical documents into a set of historical trees, wherein the set of trees further comprises the set of historical trees; parsing the set of historical trees to obtain a set of historical results for each historical tree; mapping each set of historical results to a respective subset of insight elements; determining an insight for each historical document based on the respective subset of insight elements for each set of historical results; training the first machine-learning model based on the respective subset of insight elements for each historical tree and the determined insight for each historical document; labeling the respective subset of insight elements for each set of historical results with a historical health score; and training the second machine-learning model to generate a health score based on each respective subset of insight elements.
13. The system of claim 12, wherein: labeling each subset of insight elements for each set of historical results with a historical health score is based on whether an organization associated with each historical document was operating during a first time period; and the health score for the document indicates a probability that an organization associated with the document will be operating after a second time period.
14. The system of claim 12, wherein receiving the set of rules and the set of insight elements comprises receiving the set of rules and set of insight elements from a computing device associated with an expert associated with the set of historical documents.
15. A non-transitory computer-readable medium comprising instructions that, when executed by a processor of a processing system, cause the processing system to perform a method for generating an insight, comprising: receiving a request to generate an insight for a document from a user associated with the document; receiving the document; generating a tree based on the document; parsing the tree based on a set of rules to generate a set of results associated with the tree; mapping the set of results to a subset of insight elements of a set of insight elements associated with a set of trees including the tree; providing, to a first machine-learning model, the subset of insight elements; receiving, from the first machine-learning model, an insight for the document based on the subset of insight elements; and returning the insight for the document to the user.
16. The non-transitory computer-readable medium of claim 15, the method further comprising: providing, to a second machine-learning model, the subset of insight elements; receiving, from the second machine-learning model, a health score for the document; and returning the health score for the document to the user.
17. The non-transitory computer-readable medium of claim 15, the method further comprising: determining a ranking score for each insight element in the subset of insight elements according to one or more criteria; and ordering the insight elements of the subset of insight elements based on the ranking score for each insight element, wherein providing the subset of insight elements to the first machine-learning model comprises providing the subset of insight elements based on the ordering.
18. The non-transitory computer-readable medium of claim 16, wherein the document is associated with an organization, wherein the health score for the document indicates a probability of the organization being in operation during a time period.
19. The non-transitory computer-readable medium of claim 16, the method further comprising: receiving the set of rules and the set of insight elements associated with the set of trees including the tree; receiving a set of historical documents associated with the set of insight elements; converting the set of historical documents into a set of historical trees, wherein the set of trees further comprises the set of historical trees; parsing the set of historical trees to obtain a set of historical results for each historical tree; mapping each set of historical results to a respective subset of insight elements; determining an insight for each historical document based on the respective subset of insight elements for each set of historical results; training the first machine-learning model based on the respective subset of insight elements for each historical tree and the determined insight for each historical document; labeling the respective subset of insight elements for each set of historical results with a historical health score; and training the second machine-learning model to generate a health score based on each respective subset of insight elements.
20. The non-transitory computer-readable medium of claim 19, wherein: labeling each subset of insight elements for each set of historical results with a historical health score is based on whether an organization associated with each historical document was operating during a first time period; and the health score for the document indicates a probability that an organization associated with the document will be operating after a second time period.
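The flow recited in claims 1 and 3 can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the claimed implementation. Several pieces are assumptions introduced only for this example: the nested-dictionary document format, the `Node` tree class, the rule predicates, the ranking scores, and the `stub_model` stand-in for the first machine-learning model. Parsing results are modeled as the IDs of the rules that fire, and each ID maps to at most one insight element.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical document-tree node: a label, an optional value, children."""
    label: str
    value: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def find(self, label: str) -> Optional["Node"]:
        # Depth-first search for the first node with a matching label.
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None


def generate_tree(document: Dict[str, Dict[str, float]]) -> Node:
    # Sections become children of the root; line items become leaves.
    root = Node("document")
    for section, items in document.items():
        sec = Node(section)
        for name, amount in items.items():
            sec.children.append(Node(name, amount))
        root.children.append(sec)
    return root


Rule = Callable[[Node], bool]


def parse_tree(tree: Node, rules: Dict[str, Rule]) -> List[str]:
    # The "set of results" is modeled as the IDs of rules that fired.
    return [rule_id for rule_id, rule in rules.items() if rule(tree)]


def map_results(results: List[str], elements: Dict[str, str]) -> List[str]:
    # Keep only insight elements associated with a fired rule.
    return [elements[r] for r in results if r in elements]


def order_elements(subset: List[str], scores: Dict[str, float]) -> List[str]:
    # Claim 3: order by ranking score, highest first (scores are assumed inputs).
    return sorted(subset, key=lambda e: scores.get(e, 0.0), reverse=True)


def generate_insight(document, rules, elements, scores, model) -> str:
    tree = generate_tree(document)
    results = parse_tree(tree, rules)
    subset = map_results(results, elements)
    return model(order_elements(subset, scores))


# --- Hypothetical usage ---
def value_of(tree: Node, label: str) -> float:
    node = tree.find(label)
    return node.value if node is not None and node.value is not None else 0.0


rules = {
    "negative_cash_flow": lambda t: value_of(t, "net_cash_flow") < 0,
    "high_leverage": lambda t: value_of(t, "total_liabilities")
    > 2 * value_of(t, "total_equity"),
    "revenue_present": lambda t: value_of(t, "revenue") > 0,
}
insight_elements = {
    "negative_cash_flow": "Operating cash outflow exceeds inflow.",
    "high_leverage": "Liabilities are more than twice equity.",
}
scores = {
    "Operating cash outflow exceeds inflow.": 0.9,
    "Liabilities are more than twice equity.": 0.7,
}


def stub_model(ordered: List[str]) -> str:
    # Stand-in for the first machine-learning model: concatenates elements.
    return " ".join(ordered) if ordered else "No notable findings."


doc = {
    "cash_flow": {"net_cash_flow": -1200.0},
    "balance_sheet": {"total_liabilities": 5000.0, "total_equity": 2000.0},
    "income": {"revenue": 10000.0},
}
print(generate_insight(doc, rules, insight_elements, scores, stub_model))
```

A rule that fires but has no associated insight element (here, `revenue_present`) is dropped during mapping. That is why only two ranked elements reach the model stand-in. In a real embodiment, `stub_model` would be replaced by a trained model, as described in claim 5.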